Deep metric learning (DML) learns a mapping to an embedding space in which similar data lie close together and dissimilar data lie far apart. However, conventional proxy-based losses for DML suffer from two problems: a gradient issue, and difficulty in handling real-world datasets whose classes contain multiple local centers. In addition, existing DML performance metrics have shortcomings in stability and flexibility. This paper proposes a multi-proxy anchor (MPA) loss and the normalized discounted cumulative gain (NDCG@k) metric. This study makes the following three contributions: (1) the MPA loss enables learning on real-world datasets by using multiple proxies per class; (2) the MPA loss improves the trainability of neural networks by resolving the gradient issue; (3) the NDCG@k metric encourages comprehensive evaluation across various datasets. Finally, we demonstrate the effectiveness of the MPA loss, which achieves the highest accuracy on two datasets of fine-grained images.
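The abstract does not give the exact formulation of the MPA loss, so the snippet below is only a minimal sketch of the idea it describes: a proxy-anchor-style loss in which each class owns several learnable proxies, so that classes with multiple local centers can still be covered. The class name, hyperparameters, and proxy layout (MultiProxyAnchorLoss, margin, alpha, num_proxies_per_class) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a proxy-anchor-style loss with several learnable proxies per class.
# Assumed design, not the paper's reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiProxyAnchorLoss(nn.Module):
    def __init__(self, num_classes, num_proxies_per_class, embed_dim,
                 margin=0.1, alpha=32.0):
        super().__init__()
        # Several proxies per class so a class with multiple local centers
        # can be represented; shape: (num_classes * num_proxies_per_class, embed_dim).
        self.proxies = nn.Parameter(
            torch.randn(num_classes * num_proxies_per_class, embed_dim))
        nn.init.kaiming_normal_(self.proxies, mode='fan_out')
        # Class label of each proxy: [0, 0, ..., 1, 1, ..., C-1, ...].
        self.proxy_labels = torch.arange(num_classes).repeat_interleave(
            num_proxies_per_class)
        self.margin = margin
        self.alpha = alpha

    def forward(self, embeddings, labels):
        # Cosine similarity between every embedding and every proxy: (B, P).
        sim = F.linear(F.normalize(embeddings), F.normalize(self.proxies))
        proxy_labels = self.proxy_labels.to(labels.device)
        pos_mask = labels.unsqueeze(1) == proxy_labels.unsqueeze(0)
        neg_mask = ~pos_mask

        # Proxy-anchor-style log-sum-exp terms: pull samples toward proxies of
        # their own class and push them away from all other proxies.
        pos_exp = torch.exp(-self.alpha * (sim - self.margin))
        neg_exp = torch.exp(self.alpha * (sim + self.margin))

        with_pos = pos_mask.any(dim=0)  # proxies that have positives in this batch
        pos_term = torch.log1p((pos_exp * pos_mask).sum(dim=0))[with_pos].mean()
        neg_term = torch.log1p((neg_exp * neg_mask).sum(dim=0)).mean()
        return pos_term + neg_term
```

A typical call would be loss_fn = MultiProxyAnchorLoss(num_classes=100, num_proxies_per_class=3, embed_dim=512) followed by loss = loss_fn(embeddings, labels) on the backbone's embeddings, with the proxies trained jointly with the network.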
While human evaluation is the most reliable metric for evaluating speech generation systems, it is generally costly and time-consuming. Previous studies on automatic speech quality assessment address the problem by predicting human evaluation scores with machine learning models. However, they rely on supervised learning and thus suffer from high annotation costs and domain-shift problems. We propose SpeechLMScore, an unsupervised metric to evaluate generated speech using a speech-language model. SpeechLMScore maps a speech signal into a sequence of discrete tokens and computes the average log-probability of generating that sequence under the speech-language model. Therefore, it does not require human annotation and is a highly scalable framework. Evaluation results demonstrate that the proposed metric shows a promising correlation with human evaluation scores on different speech generation tasks including voice conversion, text-to-speech, and speech enhancement.
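The abstract only pins down the high-level recipe (discrete speech tokens scored by an autoregressive speech-language model, averaged over the sequence), so the following is a minimal sketch under that reading; token_lm is a hypothetical stand-in for a pretrained speech LM that returns next-token logits, and the speech-to-token step is assumed to have been done upstream.

```python
# Sketch of an average log-probability score for a discrete speech-token sequence.
# `token_lm` is a hypothetical autoregressive model, not the released SpeechLMScore API.
import torch
import torch.nn.functional as F


def speech_lm_score(tokens: torch.Tensor, token_lm) -> float:
    """Average log-probability of a discrete speech-token sequence.

    tokens:   1-D LongTensor of speech units extracted from a waveform.
    token_lm: assumed callable mapping a (1, T) token batch to next-token
              logits of shape (1, T, vocab_size).
    """
    with torch.no_grad():
        logits = token_lm(tokens[:-1].unsqueeze(0)).squeeze(0)   # (T-1, vocab)
        log_probs = F.log_softmax(logits, dim=-1)
        # Log-probability the LM assigns to each token actually observed next;
        # a higher average suggests the utterance is more natural to the LM.
        token_ll = log_probs.gather(1, tokens[1:].unsqueeze(1)).squeeze(1)
    return token_ll.mean().item()
```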